variation operator
Scaling Policy Gradient Quality-Diversity with Massive Parallelization via Behavioral Variations
Mitsides, Konstantinos, Faldor, Maxence, Cully, Antoine
Quality-Diversity optimization comprises a family of evolutionary algorithms aimed at generating a collection of diverse and high-performing solutions. MAP-Elites (ME), a notable example, is used effectively in fields like evolutionary robotics. However, the reliance of ME on random mutations from Genetic Algorithms limits its ability to evolve high-dimensional solutions. Methods proposed to overcome this include using gradient-based operators like policy gradients or natural evolution strategies. While successful at scaling ME for neuroevolution, these methods often suffer from slow training speeds, or difficulties in scaling with massive parallelization due to high computational demands or reliance on centralized actor-critic training. In this work, we introduce a fast, sample-efficient ME based algorithm capable of scaling up with massive parallelization, significantly reducing runtimes without compromising performance. Our method, ASCII-ME, unlike existing policy gradient quality-diversity methods, does not rely on centralized actor-critic training. It performs behavioral variations based on time step performance metrics and maps these variations to solutions using policy gradients. Our experiments show that ASCII-ME can generate a diverse collection of high-performing deep neural network policies in less than 250 seconds on a single GPU. Additionally, it operates on average, five times faster than state-of-the-art algorithms while still maintaining competitive sample efficiency.
Deep Insights into Automated Optimization with Large Language Models and Evolutionary Algorithms
Designing optimization approaches, whether heuristic or meta-heuristic, usually demands extensive manual intervention and has difficulty generalizing across diverse problem domains. The combination of Large Language Models (LLMs) and Evolutionary Algorithms (EAs) offers a promising new approach to overcome these limitations and make optimization more automated. In this setup, LLMs act as dynamic agents that can generate, refine, and interpret optimization strategies, while EAs efficiently explore complex solution spaces through evolutionary operators. Since this synergy enables a more efficient and creative search process, we first conduct an extensive review of recent research on the application of LLMs in optimization. We focus on LLMs' dual functionality as solution generators and algorithm designers. Then, we summarize the common and valuable designs in existing work and propose a novel LLM-EA paradigm for automated optimization. Furthermore, centered on this paradigm, we conduct an in-depth analysis of innovative methods for three key components: individual representation, variation operators, and fitness evaluation. We address challenges related to heuristic generation and solution exploration, especially from the LLM prompts' perspective. Our systematic review and thorough analysis of the paradigm can assist researchers in better understanding the current research and promoting the development of combining LLMs with EAs for automated optimization.
Parametric-Task MAP-Elites
Anne, Timothée, Mouret, Jean-Baptiste
Optimizing a set of functions simultaneously by leveraging their similarity is called multi-task optimization. Current black-box multi-task algorithms only solve a finite set of tasks, even when the tasks originate from a continuous space. In this paper, we introduce Parametric-task MAP-Elites (PT-ME), a novel black-box algorithm to solve continuous multi-task optimization problems. This algorithm (1) solves a new task at each iteration, effectively covering the continuous space, and (2) exploits a new variation operator based on local linear regression. The resulting dataset of solutions makes it possible to create a function that maps any task parameter to its optimal solution. We show on two parametric-task toy problems and a more realistic and challenging robotic problem in simulation that PT-ME outperforms all baselines, including the deep reinforcement learning algorithm PPO.
Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning
Faldor, Maxence, Chalumeau, Félix, Flageat, Manon, Cully, Antoine
A fundamental trait of intelligence involves finding novel and creative solutions to address a given challenge or to adapt to unforeseen situations. Reflecting this, Quality-Diversity optimization is a family of Evolutionary Algorithms, that generates collections of both diverse and high-performing solutions. Among these, MAP-Elites is a prominent example, that has been successfully applied to a variety of domains, including evolutionary robotics. However, MAP-Elites performs a divergent search with random mutations originating from Genetic Algorithms, and thus, is limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation using a gradient-based variation operator inspired by deep reinforcement learning which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based variation operator hinders diversity. In this work, we present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into one single versatile policy that can execute a diversity of behaviors, (3) we exploit the descriptor-conditioned actor by injecting it in the population, despite network architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher QD score and coverage compared to all baselines on seven challenging continuous control locomotion tasks.
Improving the Data Efficiency of Multi-Objective Quality-Diversity through Gradient Assistance and Crowding Exploration
Janmohamed, Hannah, Pierrot, Thomas, Cully, Antoine
Quality-Diversity (QD) algorithms have recently gained traction as optimisation methods due to their effectiveness at escaping local optima and capability of generating wide-ranging and high-performing solutions. Recently, Multi-Objective MAP-Elites (MOME) extended the QD paradigm to the multi-objective setting by maintaining a Pareto front in each cell of a map-elites grid. MOME achieved a global performance that competed with NSGA-II and SPEA2, two well-established Multi-Objective Evolutionary Algorithms (MOEA), while also acquiring a diverse repertoire of solutions. However, MOME is limited by non-directed genetic search mechanisms which struggle in high-dimensional search spaces. In this work, we present Multi-Objective MAP-Elites with Policy-Gradient Assistance and Crowding-based Exploration (MOME-PGX): a new QD algorithm that extends MOME to improve its data efficiency and performance. MOME-PGX uses gradient-based optimisation to efficiently drive solutions towards higher performance. It also introduces crowding-based mechanisms to create an improved exploration strategy and to encourage uniformity across Pareto fronts. We evaluate MOME-PGX in four simulated robot locomotion tasks and demonstrate that it converges faster and to a higher performance than all other baselines. We show that MOME-PGX is between 4.3 and 42 times more data-efficient than MOME and doubles the performance of MOME, NSGA-II and SPEA2 in challenging environments.
Zoetrope Genetic Programming for Regression
Boisbunon, Aurélie, Fanara, Carlo, Grenet, Ingrid, Daeden, Jonathan, Vighi, Alexis, Schoenauer, Marc
The Zoetrope Genetic Programming (ZGP) algorithm is based on an original representation for mathematical expressions, targeting evolutionary symbolic regression.The zoetropic representation uses repeated fusion operations between partial expressions, starting from the terminal set. Repeated fusions within an individual gradually generate more complex expressions, ending up in what can be viewed as new features. These features are then linearly combined to best fit the training data. ZGP individuals then undergo specific crossover and mutation operators, and selection takes place between parents and offspring. ZGP is validated using a large number of public domain regression datasets, and compared to other symbolic regression algorithms, as well as to traditional machine learning algorithms. ZGP reaches state-of-the-art performance with respect to both types of algorithms, and demonstrates a low computational time compared to other symbolic regression approaches.
Pathnet is Deepmind's step to a super neural network for creating an artificial general intelligence
For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropogation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function.
An Evolutionary Metaheuristic Based on State Decomposition for Domain-Independent Satisficing Planning
Bibaï, Jacques (Thales Research and Technology) | Savéant, Pierre (Thales Research and Technology) | Schoenauer, Marc (INRIA) | Vidal, Vincent (ONERA)
DAEX is a metaheuristic designed to improve the plan quality and the scalability of an encapsulated planning system. DAEX is based on a state decomposition strategy, driven by an evolutionary algorithm, which benefits from the use of a classical planning heuristic to maintain an ordering of atoms within the individuals. The proof of concept is achieved by embedding the domain-independent satisficing YAHSP planner and using the critical path h1 heuristic. Experiments with the resulting algorithm are performed on a selection of IPC benchmarks from classical, cost-based and temporal domains. Under the experimental conditions of the IPC, and in particular with a universal parameter setting common to all domains, DAEYAHSP is compared to the best planner for each type of domain. Results show that DAEYAHSP performs very well both on coverage and quality metrics. It is particularly noticeable that DAEX improves a lot on plan quality when compared to YAHSP, which is known to provide largely suboptimal solutions; making it competitive with state-of-the-art planners. This article gives a full account of the algorithm, reports on the experiments and provides some insights on the algorithm behavior.
Evolutionary Computing
Eiben, Aguston E., Schoenauer, Marc
Evolutionary computing (EC) is an exciting development in Computer Science. It amounts to building, applying and studying algorithms based on the Darwinian principles of natural selection. In this paper we briefly introduce the main concepts behind evolutionary computing. We present the main components all evolutionary algorithms (EA), sketch the differences between different types of EAs and survey application areas ranging from optimization, modeling and simulation to entertainment.